The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.
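The core idea of augmenting jointly in feature space, rather than per raw modality, can be sketched as follows. This is a minimal illustration, not LeMDA's actual learned augmentation network: `augment_features` and the shared random linear map are hypothetical stand-ins for the learned module.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_features(modality_feats, scale=0.1):
    """Jointly perturb per-modality latent vectors in feature space.

    A stand-in for a learned augmentation network: here the
    'augmentation' is one shared random linear map applied to every
    modality's latent, so the modalities are perturbed consistently
    rather than independently.
    """
    dim = modality_feats[0].shape[-1]
    shared = np.eye(dim) + scale * rng.standard_normal((dim, dim))
    return [f @ shared for f in modality_feats]

# e.g. image and text encoders each produce an 8-dim latent
image_z = rng.standard_normal(8)
text_z = rng.standard_normal(8)
aug_image_z, aug_text_z = augment_features([image_z, text_z])
```

Because the perturbation acts on encoder outputs, it sidesteps the problem raised above: no modality-specific raw-input transform (crop, synonym swap, etc.) needs to be specified.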
Pre-trained large language models can efficiently interpolate human-written prompts in a natural way. Multitask prompted learning can help generalization through a diverse set of tasks at once, thus enhancing the potential for more effective downstream fine-tuning. To perform efficient multitask inference in the same batch, parameter-efficient fine-tuning methods such as prompt tuning have been proposed. However, existing prompt tuning methods may lack generalization. We propose SPT, a semi-parametric prompt tuning method for multitask prompted learning. The novel component of SPT is a memory bank from which memory prompts are retrieved based on discrete prompts. Extensive experiments, such as (i) fine-tuning a full language model with SPT on 31 different tasks from 8 different domains and evaluating zero-shot generalization on 9 heldout datasets under 5 NLP task categories and (ii) pretraining SPT on the GLUE datasets and evaluating fine-tuning on the SuperGLUE datasets, demonstrate the effectiveness of SPT.
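The memory-bank retrieval step can be sketched as a nearest-neighbor lookup keyed on the embedding of the discrete prompt. Cosine similarity, top-k selection, and mean pooling are illustrative choices here, not necessarily those used by SPT.

```python
import numpy as np

def retrieve_memory_prompt(query_emb, memory_keys, memory_prompts, k=2):
    """Retrieve the memory prompts whose keys are most similar to the
    embedded discrete prompt, then pool them into one soft prompt."""
    q = query_emb / np.linalg.norm(query_emb)
    keys = memory_keys / np.linalg.norm(memory_keys, axis=1, keepdims=True)
    sims = keys @ q                      # cosine similarity per slot
    top = np.argsort(sims)[-k:]          # indices of the k best slots
    return memory_prompts[top].mean(axis=0)

rng = np.random.default_rng(0)
memory_keys = rng.standard_normal((16, 4))     # one key per memory slot
memory_prompts = rng.standard_normal((16, 4))  # one soft prompt per slot
query = rng.standard_normal(4)                 # embedded discrete prompt
prompt = retrieve_memory_prompt(query, memory_keys, memory_prompts)
```

The retrieved soft prompt would then be prepended to the input, making the prompt semi-parametric: partly stored in the memory bank, partly conditioned on the discrete prompt.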
Multimodal image-text models have shown remarkable performance in the past few years. However, evaluating their robustness against distribution shifts is crucial before adopting them in real-world applications. In this paper, we investigate the robustness of 9 popular open-sourced image-text models under common perturbations on five tasks (image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation). In particular, we propose several new multimodal robustness benchmarks by applying 17 image perturbation and 16 text perturbation techniques on top of existing datasets. We observe that multimodal models are not robust to image and text perturbations, especially to image perturbations. Among the tested perturbation methods, character-level perturbations constitute the most severe distribution shift for text, and zoom blur is the most severe shift for image data. We also introduce two new robustness metrics (MMI and MOR) for proper evaluations of multimodal models. We hope our extensive study sheds light on new directions for the development of robust multimodal models.
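As a concrete example of the text side, a character-level perturbation (reported above as the most severe text shift) can be as simple as swapping adjacent characters. This is one of many possible corruptions; the benchmark's exact 16 text perturbations are not reproduced here, and `char_swap` is an illustrative name.

```python
import random

def char_swap(text, n_swaps=1, seed=0):
    """Character-level perturbation: swap random adjacent characters.

    Length and character multiset are preserved, but tokenization of
    the perturbed string can change drastically.
    """
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

perturbed = char_swap("a photo of a cat", n_swaps=2)
```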
Establishing open and general benchmarks has been a critical driving force behind the success of modern machine learning techniques. As machine learning is being applied to broader domains and tasks, there is a need to establish richer and more diverse benchmarks to better reflect the reality of the application scenarios. Graph learning is an emerging field of machine learning that urgently needs more and better benchmarks. To accommodate this need, we introduce Graph Learning Indexer (GLI), a benchmark curation platform for graph learning. In comparison to existing graph learning benchmark libraries, GLI highlights two novel design objectives. First, GLI is designed to incentivize \emph{dataset contributors}. In particular, we incorporate various measures to minimize the effort of contributing and maintaining a dataset, increase the usability of the contributed dataset, and encourage attribution to the dataset's different contributors. Second, GLI is designed to curate a knowledge base, rather than a plain collection, of benchmark datasets. We use multiple sources of meta information to augment the benchmark datasets with \emph{rich characteristics}, so that they can be easily selected and used in downstream research or development. The source code of GLI is available at \url{https://github.com/Graph-Learning-Benchmarks/gli}.
Models should be able to adapt to unseen data during test-time to avoid performance drops caused by inevitable distribution shifts in real-world deployment scenarios. In this work, we tackle the practical yet challenging test-time adaptation (TTA) problem, where a model adapts to the target domain without accessing the source data. We propose a simple recipe called \textit{Data-efficient Prompt Tuning} (DePT) with two key ingredients. First, DePT plugs visual prompts into the vision Transformer and only tunes these source-initialized prompts during adaptation. We find such parameter-efficient finetuning can efficiently adapt the model representation to the target domain without overfitting to the noise in the learning objective. Second, DePT bootstraps the source representation to the target domain by memory bank-based online pseudo-labeling. A hierarchical self-supervised regularization specially designed for prompts is jointly optimized to alleviate error accumulation during self-training. With much fewer tunable parameters, DePT demonstrates not only state-of-the-art performance on the major adaptation benchmarks VisDA-C, ImageNet-C, and DomainNet-126, but also superior data efficiency, i.e., adaptation with only 1\% or 10\% of the data without much performance degradation compared to 100\% of the data. In addition, DePT can also be flexibly extended to online or multi-source TTA settings.
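The first ingredient, plugging tunable prompts into a vision Transformer, amounts to prepending a small set of prompt tokens to the patch-token sequence. The sketch below shows only this tensor manipulation; the shapes and the name `prepend_prompts` are illustrative, and in a DePT-style setup only `prompts` would receive gradients during adaptation.

```python
import numpy as np

def prepend_prompts(patch_tokens, prompts):
    """Prepend tunable prompt tokens to a ViT layer's patch-token
    sequence; the frozen backbone then attends over both."""
    return np.concatenate([prompts, patch_tokens], axis=0)

rng = np.random.default_rng(0)
patch_tokens = rng.standard_normal((196, 32))  # 14x14 patches, dim 32
prompts = rng.standard_normal((8, 32))         # 8 source-initialized prompts
tokens = prepend_prompts(patch_tokens, prompts)
```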
Although more layers and more parameters generally improve model accuracy, such large models typically have high computational complexity and require large amounts of memory, which exceeds the inference capacity of small devices and incurs long training times. Moreover, even on high-performance servers, the long training times and inference times of large models are hard to afford. As an effective way to compress a large deep model (the teacher model) into a compact model (the student model), knowledge distillation is a promising approach for dealing with large models. Existing knowledge distillation methods cannot exploit elastically available computing resources, which leads to low efficiency. In this paper, we propose an elastic deep learning framework for knowledge distillation, namely EDL-Dist. The advantages of EDL-Dist are threefold. First, the inference and training processes are decoupled. Second, elastically available computing resources can be utilized to improve efficiency. Third, fault tolerance of both the training and inference processes is supported. We conduct extensive experiments showing that the throughput of EDL-Dist is up to 3.125 times faster than that of the baseline method (online knowledge distillation), while the accuracy is similar or higher.
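The distillation objective that EDL-Dist schedules elastically is the standard one: a KL divergence between temperature-softened teacher and student distributions. The sketch below shows only this loss; EDL-Dist's systems-level contribution (decoupled inference/training, elastic scheduling, fault tolerance) is not modeled.

```python
import numpy as np

def softmax(z, t=1.0):
    """Numerically stable softmax with temperature t."""
    z = z / t
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, t=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by t^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float(np.sum(p * (np.log(p) - np.log(q))) * t * t)

loss = distillation_loss(np.array([1.0, 0.5, -0.2]),
                         np.array([2.0, 0.1, -1.0]))
```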
Conventionally, prediction of Earth systems (e.g., weather and climate) relies on numerical simulation with complex physical models, and is hence both computationally expensive and demanding of domain expertise. With the explosive growth of spatiotemporal Earth observation data over the past decade, data-driven models applying deep learning (DL) have shown potential for various Earth system forecasting tasks. The Transformer, an emerging DL architecture, has seen limited adoption in this area despite its broad success in other domains. In this paper, we propose Earthformer, a space-time Transformer for Earth system forecasting. Earthformer is based on a generic, flexible, and efficient space-time attention block named Cuboid Attention. The idea is to decompose the data into cuboids and apply cuboid-level self-attention in parallel. These cuboids are further connected with a collection of global vectors. We conduct experiments on the MovingMNIST dataset and a newly proposed chaotic N-body MNIST dataset to verify the effectiveness of Cuboid Attention and figure out the best design for Earthformer. Experiments on two real-world benchmarks, precipitation nowcasting and El Niño/Southern Oscillation (ENSO) forecasting, show that Earthformer achieves state-of-the-art performance.
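The cuboid decomposition can be sketched as a reshape of a (T, H, W, C) tensor into non-overlapping blocks, with self-attention run independently inside each block. This is a single-head sketch of the decomposition idea only; Earthformer's global vectors, multi-head projections, and learned weights are omitted, and the function name is illustrative.

```python
import numpy as np

def cuboid_self_attention(x, cuboid=(2, 2, 2)):
    """Split a (T, H, W, C) tensor into non-overlapping cuboids and
    apply plain dot-product self-attention within each cuboid."""
    T, H, W, C = x.shape
    ct, ch, cw = cuboid
    # regroup into (num_cuboids, cuboid_size, C)
    blocks = (x.reshape(T // ct, ct, H // ch, ch, W // cw, cw, C)
                .transpose(0, 2, 4, 1, 3, 5, 6)
                .reshape(-1, ct * ch * cw, C))
    scores = blocks @ blocks.transpose(0, 2, 1) / np.sqrt(C)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    out = attn @ blocks
    # invert the regrouping back to (T, H, W, C)
    return (out.reshape(T // ct, H // ch, W // cw, ct, ch, cw, C)
               .transpose(0, 3, 1, 4, 2, 5, 6)
               .reshape(T, H, W, C))

x = np.random.default_rng(0).standard_normal((4, 4, 4, 8))
y = cuboid_self_attention(x)
```

Because attention is restricted to each cuboid, the cost scales with the cuboid size rather than with the full T·H·W sequence length, which is the efficiency argument behind the design.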
Adversarial training (AT) defends deep neural networks against adversarial attacks. One challenge that limits its practical application is the performance degradation on clean samples. A major bottleneck identified by previous works is the widely used batch normalization (BN), which struggles to model the different statistics of clean and adversarial training samples in AT. Although the dominant approach is to extend BN to capture this mixture of distributions, we propose to completely eliminate this bottleneck by removing all BN layers in AT. Our normalizer-free robust training (NoFrost) method extends recent advances in normalizer-free networks to AT, exploiting their unexplored advantage in handling the mixture-distribution challenge. We show that NoFrost achieves adversarial robustness with only a minor sacrifice in clean sample accuracy. On ImageNet with ResNet50, NoFrost achieves $74.06\%$ clean accuracy, a drop of merely $2.00\%$ from standard training. In contrast, a BN-based AT baseline obtains $59.28\%$ clean accuracy, suffering a significant $16.78\%$ drop from standard training. In addition, NoFrost achieves $23.56\%$ adversarial robustness against PGD attack, improving upon the $13.57\%$ robustness of BN-based AT. We observe better model smoothness and larger decision margins from NoFrost, which make the model less sensitive to input perturbations and thus more robust. Moreover, when more data augmentations are incorporated into NoFrost, it achieves comprehensive robustness against multiple distribution shifts. Code and pre-trained models are publicly available at https://github.com/amazon-research/normalizer-free-robust-training.
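NoFrost builds on normalizer-free networks, whose core ingredient is scaled weight standardization: standardizing each output filter of the weight tensor instead of normalizing activations, so no batch statistics are ever shared across clean and adversarial samples. The sketch below shows this ingredient of NF-networks generally; it is a detail assumed from that line of work, not spelled out in the abstract above.

```python
import numpy as np

def scaled_ws_weight(w, gain=1.0, eps=1e-6):
    """Scaled weight standardization for a weight matrix of shape
    (out_channels, fan_in): zero-mean, fan-in-scaled unit-variance rows.
    Replaces activation normalization, so no batch statistics exist."""
    mean = w.mean(axis=1, keepdims=True)
    var = w.var(axis=1, keepdims=True)
    fan_in = w.shape[1]
    return gain * (w - mean) / np.sqrt(var * fan_in + eps)

w = np.random.default_rng(0).standard_normal((16, 9))
w_std = scaled_ws_weight(w)
```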
The advent of large-scale pre-trained language models has contributed greatly to recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on downstream tasks. Despite its recent success and wide adoption, fine-tuning a pre-trained language model often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named layerwise noise stability regularization (LNSR). Specifically, we propose to inject standard Gaussian noise or in-manifold noise and regularize the hidden representations of the fine-tuned model. We first provide theoretical analyses to support the efficacy of our method. We then demonstrate the advantages of the proposed method over other state-of-the-art algorithms, including L2-SP, Mixout, and SMART. While these previous works only verify the effectiveness of their methods on relatively simple text classification tasks, we also verify the effectiveness of our method on question answering tasks, where the target problem is much more difficult and more training examples are available. Furthermore, extensive experimental results show that the proposed algorithm can not only improve the in-domain performance of language models but also improve their domain generalization performance on out-of-domain data.
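The noise-stability idea can be sketched as a penalty on how much a layer's output changes when Gaussian noise is injected into its input. This is a single-layer illustration with a made-up layer function; LNSR's actual layerwise formulation and noise choices are not reproduced here.

```python
import numpy as np

def noise_stability_penalty(layer_fn, h, sigma=0.1, seed=0):
    """Mean squared distance between a layer's output on a hidden
    state h and on h plus Gaussian noise; small values mean the
    representation is stable under perturbation."""
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal(h.shape)
    diff = layer_fn(h + noise) - layer_fn(h)
    return float(np.mean(diff ** 2))

W = np.random.default_rng(1).standard_normal((8, 8))
layer = lambda h: np.tanh(h @ W)  # toy stand-in for a Transformer layer
penalty = noise_stability_penalty(layer, np.random.default_rng(2).standard_normal(8))
```

During fine-tuning, such a penalty would be added to the task loss so the model is discouraged from learning representations that react sharply to small input perturbations.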
Federated learning (FL) is a machine learning technique that enables participants to collaboratively train high-quality models without exchanging their private data. Participants in cross-silo FL (CS-FL) settings are independent organizations with different task needs; they are concerned not only with data privacy but also with training their unique models independently, due to intellectual property considerations. Most existing FL methods cannot satisfy the above scenario. In this paper, we propose an FL method based on pseudo-labeling of unlabeled data, carried out through an auxiliary process. To the best of our knowledge, this is the first FL method simultaneously compatible with heterogeneous tasks, heterogeneous models, and heterogeneous training algorithms. Experimental results show that the proposed method outperforms competitive baselines. This is especially true for non-independent and identically distributed (non-IID) settings and heterogeneous models, where the proposed method achieves a 35% performance improvement.
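A common building block of pseudo-labeling on unlabeled data is a confidence filter: only predictions above a threshold are promoted to training labels. The sketch below shows this generic filter; the threshold rule and the function name are illustrative choices, not necessarily the paper's exact mechanism.

```python
import numpy as np

def confident_pseudo_labels(probs, threshold=0.9):
    """Keep unlabeled examples whose top predicted class probability
    meets the threshold; return their indices and pseudo-labels."""
    conf = probs.max(axis=1)
    keep = conf >= threshold
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]

probs = np.array([[0.95, 0.05],   # confident -> kept, label 0
                  [0.60, 0.40],   # uncertain -> discarded
                  [0.05, 0.95]])  # confident -> kept, label 1
idx, labels = confident_pseudo_labels(probs)
```

In a cross-silo setting, each organization could apply such a filter to a shared pool of unlabeled data while keeping its own model architecture and training algorithm private.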